
Wire torch trace dir through parse pipeline and CLI #355

Closed

FindHao wants to merge 3 commits into main from export-D95080073

Conversation

@FindHao (Member) commented Mar 3, 2026

Summary:
Add --torch-trace-dir CLI parameter and wire it through the full pipeline:
unified_parse() -> oss_run() -> parse_logs() -> parse_single_file().

Includes auto-discovery logic: when --torch-trace-dir is not specified,
torch trace log files are automatically searched in the same directory as
tritonparse logs. This enables kernel compile attribution in multi-process
scenarios without requiring explicit user configuration.

Also exports CompileInfo, discover_torch_trace_files, and
parse_torch_trace_logs from the parse module's public API.

Differential Revision: D95080073
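
The auto-discovery fallback described above could look roughly like this. This is a minimal sketch, not the PR's actual code: `find_torch_trace_logs`, the `TORCH_TRACE_PREFIX` value, and the assumption that an explicit directory fully replaces the fallback are all illustrative.

```python
import os

# Assumed filename prefix for torch trace logs; the real constant lives in the PR.
TORCH_TRACE_PREFIX = "dedicated_log_torch_trace"


def find_torch_trace_logs(torch_trace_dir, tritonparse_log_dir):
    """Return torch trace log paths, falling back to the tritonparse log dir.

    Mirrors the described behavior: an explicit --torch-trace-dir wins; when it
    is absent, search the same directory the tritonparse logs live in.
    """
    search_dir = torch_trace_dir if torch_trace_dir else tritonparse_log_dir
    if not os.path.isdir(search_dir):
        return []
    return sorted(
        os.path.join(search_dir, name)
        for name in os.listdir(search_dir)
        if TORCH_TRACE_PREFIX in name and name.endswith(".log")
    )
```

The point of the sketch is the precedence: the CLI flag is an override, and the zero-configuration path reuses the directory the user already pointed the parser at.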

Yueming Hao added 2 commits March 3, 2026 10:07
Summary:
Add a new module `torch_trace_parser.py` that parses inductor's torch trace
log files to extract kernel_source_path -> CompileInfo mappings. These
mappings will be used to attribute Triton kernels to their originating
PyTorch compilation frame when pt_info is missing (e.g., in multi-process
Triton JIT compilation scenarios).

The parser handles the glog-formatted torch trace log files, finds
`inductor_output_code` events, extracts frame_id/frame_compile_id from
JSON metadata, and parses `# kernel path:` comments from the output_code
payload.

Differential Revision: D95080075
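
Extracting the `# kernel path:` comments from an output_code payload can be sketched with a regex like the one below. The payload shape and exact comment spacing are assumptions; only the `# kernel path:` convention itself comes from the commit message.

```python
import re

# Hypothetical payload layout: inductor's output_code text is assumed to carry
# one `# kernel path: <path>` comment per generated Triton kernel.
KERNEL_PATH_RE = re.compile(r"^#\s*kernel path:\s*(?P<path>\S+)", re.MULTILINE)


def extract_kernel_paths(output_code_payload: str):
    """Pull every `# kernel path:` comment out of an output_code payload."""
    return [m.group("path") for m in KERNEL_PATH_RE.finditer(output_code_payload)]
```

Each extracted path would then be keyed against the frame_id/frame_compile_id pulled from the event's JSON metadata to build the kernel_source_path -> CompileInfo mapping.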
Summary:
Modify parse_single_file() to accept an optional kernel_compile_mapping
parameter. When pt_info is missing from a compilation event, the mapping
is used to resolve frame_id/compile_id via python_source.file_path (primary)
or stack trace scanning (fallback for fake compilations).

Introduces _resolve_compile_info() and _determine_output_fname() helpers
to centralize the output filename determination logic for both real and
fake compilations.

Differential Revision: D95080074
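
The two-step resolution order (python_source.file_path first, stack-trace scan as the fallback for fake compilations) can be sketched as follows. The event layout and field names here are illustrative, not the PR's exact schema.

```python
def resolve_compile_info(event, kernel_compile_mapping):
    """Resolve compile info for an event whose pt_info is missing.

    Primary: look up the kernel's python_source.file_path directly.
    Fallback: scan stack-trace file paths, which covers fake compilations
    whose python_source points at a generated file.
    """
    payload = event.get("payload", {})
    file_path = payload.get("python_source", {}).get("file_path")
    if file_path and file_path in kernel_compile_mapping:
        return kernel_compile_mapping[file_path]
    # Fallback: any stack frame whose filename is a known kernel source path
    for frame in payload.get("stack", []):
        candidate = frame.get("filename")
        if candidate in kernel_compile_mapping:
            return kernel_compile_mapping[candidate]
    return None
```

A `None` result would leave the event on the existing default-output path, so the mapping only changes behavior when it can actually attribute the kernel.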
Copilot AI review requested due to automatic review settings March 3, 2026 22:54
@meta-cla bot added the CLA Signed label Mar 3, 2026
Copilot AI left a comment

Pull request overview

Adds Torch Inductor “torch trace” log support to the parsing pipeline so Triton compilation events missing pt_info (e.g., multi-process JIT) can still be attributed to the originating PyTorch compilation frame.

Changes:

  • Adds --torch-trace-dir CLI flag and wires it through unified_parse() -> oss_run() -> parse_logs() -> parse_single_file().
  • Introduces torch_trace_parser.py to discover/parse torch trace logs into kernel_source_path -> CompileInfo mappings.
  • Uses the mapping in trace_processor.parse_single_file() to pick frame-specific output filenames even when pt_info is absent (including fake compilations), and adds unit/integration tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

  • tritonparse/parse/utils.py: Adds CLI arg and threads torch_trace_dir through public entrypoints.
  • tritonparse/parse/common.py: Builds kernel→compile mapping from torch trace logs and passes it into parsing.
  • tritonparse/parse/trace_processor.py: Resolves missing pt_info via mapping and updates output file routing logic.
  • tritonparse/parse/torch_trace_parser.py: New parser + discovery utilities for torch trace logs.
  • tritonparse/parse/__init__.py: Exposes new torch-trace-related APIs in the parse module.
  • tests/cpu/test_torch_trace_parser.py: Unit tests for torch trace parsing/discovery.
  • tests/cpu/test_pipeline_integration.py: Integration tests for mapping build + end-to-end pipeline behavior.
  • tests/cpu/test_kernel_attribution.py: Tests for mapping-based attribution logic in trace_processor.


Comment on lines +187 to +200
```python
try:
    for item in os.listdir(search_dir):
        if TORCH_TRACE_PREFIX not in item:
            continue
        if not item.endswith(".log"):
            continue
        full_path = os.path.join(search_dir, item)
        if not os.path.isfile(full_path):
            continue

        rank_match = rank_pattern.search(item)
        rank = int(rank_match.group(1)) if rank_match else None
        result.setdefault(rank, []).append(full_path)
except OSError as e:
```
Copilot AI commented on Mar 3, 2026:

discover_torch_trace_files() iterates os.listdir() without sorting, so the returned file lists (and rank insertion order) are non-deterministic across platforms/filesystems. This can make downstream mapping/attribution unstable. Consider iterating over sorted(os.listdir(search_dir)) and sorting each result[rank] list before returning.

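The determinism fix the reviewer is asking for amounts to wrapping the listing in `sorted()`. A self-contained sketch (the prefix constant and rank regex are assumptions standing in for the module's real ones):

```python
import os
import re

RANK_RE = re.compile(r"rank_(\d+)")  # assumed rank pattern


def discover_sorted(search_dir, prefix="dedicated_log_torch_trace"):
    """Deterministic variant of the discovery loop.

    sorted(os.listdir(...)) makes both the per-rank file lists and the rank
    insertion order stable across platforms and filesystems.
    """
    result = {}
    for item in sorted(os.listdir(search_dir)):
        if prefix not in item or not item.endswith(".log"):
            continue
        full_path = os.path.join(search_dir, item)
        if not os.path.isfile(full_path):
            continue
        m = RANK_RE.search(item)
        rank = int(m.group(1)) if m else None
        result.setdefault(rank, []).append(full_path)
    return result
```

Because `os.listdir()` makes no ordering guarantee, sorting at the discovery boundary is cheaper than defending against arbitrary order everywhere downstream.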
Comment on lines +160 to +164
```python
for path in log_paths:
    logger.info(f"Parsing torch trace log: {path}")
    file_mapping = _parse_torch_trace_log(path)
    logger.info(f"Extracted {len(file_mapping)} kernel path mappings from {path}")
    merged.update(file_mapping)
```
Copilot AI commented on Mar 3, 2026:
parse_torch_trace_logs() merges per-file mappings via merged.update(file_mapping), which silently overwrites entries when the same kernel path appears in multiple logs. Since the caller may pass paths in non-deterministic order, this can lead to inconsistent attribution. Consider detecting collisions (warn/log) and/or defining a deterministic precedence (e.g., sort paths, keep-first, or keep-highest attempt/frame).

Suggested change (replacing the loop above):

```python
# Track which log file a given kernel_source_path was first seen in, so we can
# emit useful diagnostics on collisions.
source_to_logpath: Dict[str, str] = {}
# Process logs in a deterministic order to avoid nondeterministic attribution
# when the caller passes paths in arbitrary order.
for path in sorted(log_paths):
    logger.info(f"Parsing torch trace log: {path}")
    file_mapping = _parse_torch_trace_log(path)
    logger.info(f"Extracted {len(file_mapping)} kernel path mappings from {path}")
    # Merge while detecting collisions. Keep the first occurrence of each
    # kernel_source_path to avoid silent, order-dependent overwrites.
    for kernel_path, compile_info in file_mapping.items():
        if kernel_path in merged:
            logger.warning(
                "Duplicate kernel_source_path '%s' found in multiple torch trace "
                "logs ('%s' and '%s'); keeping the first occurrence.",
                kernel_path,
                source_to_logpath.get(kernel_path, "<unknown>"),
                path,
            )
            continue
        merged[kernel_path] = compile_info
        source_to_logpath[kernel_path] = path
```
Comment on lines +339 to +348
```python
if torch_trace_dir:
    search_dirs.append(torch_trace_dir)
# Also check the raw log directory (torch trace logs may coexist)
search_dirs.append(raw_log_dir)

all_log_paths: List[str] = []
seen_paths: set = set()
for search_dir in search_dirs:
    if not os.path.isdir(search_dir):
        continue
```
Copilot AI commented on Mar 3, 2026:
When torch_trace_dir is explicitly provided but is not a directory (missing/typo), _build_kernel_compile_mapping() currently just skips it silently. That makes the CLI flag easy to misconfigure without noticing. Consider emitting a warning (or raising) when torch_trace_dir is set but os.path.isdir(torch_trace_dir) is false.

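Surfacing a misconfigured directory instead of skipping it silently could look like this. The function name and call shape are illustrative, not the PR's actual helper:

```python
import logging
import os

logger = logging.getLogger(__name__)


def collect_search_dirs(torch_trace_dir, raw_log_dir):
    """Build the list of directories to scan for torch trace logs.

    Unlike a silent isdir() skip, an explicitly provided but invalid
    --torch-trace-dir produces a warning so typos are noticed.
    """
    search_dirs = []
    if torch_trace_dir:
        if os.path.isdir(torch_trace_dir):
            search_dirs.append(torch_trace_dir)
        else:
            logger.warning(
                "--torch-trace-dir %r is not a directory; ignoring it", torch_trace_dir
            )
    search_dirs.append(raw_log_dir)
    return search_dirs
```

Raising instead of warning is the stricter alternative; a warning keeps the auto-discovery fallback usable even when the flag is wrong.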
Comment on lines +344 to +359
```python
all_log_paths: List[str] = []
seen_paths: set = set()
for search_dir in search_dirs:
    if not os.path.isdir(search_dir):
        continue
    torch_files = discover_torch_trace_files(search_dir)
    for rank_files in torch_files.values():
        for path in rank_files:
            if path not in seen_paths:
                all_log_paths.append(path)
                seen_paths.add(path)

if not all_log_paths:
    return None

mapping = parse_torch_trace_logs(all_log_paths)
```
Copilot AI commented on Mar 3, 2026:
all_log_paths is built from directory scans and dict .values() iteration without any sorting. Because mapping collisions are resolved by last-wins in parse_torch_trace_logs(), the effective attribution can be non-deterministic. Consider sorting all_log_paths before parsing (and/or sorting within discover_torch_trace_files) to make the output stable.

Comment on lines +336 to +340
```python
def _resolve_compile_info(
    event: Dict[str, Any],
    kernel_compile_mapping: Dict[str, Any],
) -> Optional[Any]:
    """
```
Copilot AI commented on Mar 3, 2026:
The new attribution helpers type kernel_compile_mapping / return values as Any, but then access .frame_id, .frame_compile_id, etc. This weak typing can hide mistakes and makes the contract unclear (docstring says CompileInfo). Consider importing CompileInfo and using Dict[str, CompileInfo] / Optional[CompileInfo] in type hints to match actual usage.

@meta-codesync

meta-codesync bot commented Mar 3, 2026

@FindHao has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95080073.

@meta-codesync

meta-codesync bot commented Mar 4, 2026

This pull request has been merged in 952efc5.

Labels

CLA Signed, fb-exported, Merged, meta-exported
